assistant-prefill attack

Definitions

Sorry, no definitions found.

Etymologies

Sorry, no etymologies found.

Examples

  • Across several evaluations, we consistently observed that assistant–prefill attacks, wherein the model is prompted as if it has already started to say something harmful, are sometimes effective at eliciting harmful behavior.

    Highlights from the Claude 4 system prompt, Simon Willison, 2025
